Introduction
In addition to obvious networks, we can also use graph theory to think though some non-obvious application. One place this often crops up in planning is to think about adjacency/proximity as a graph. See my post on employment centers and paper on delineating areas.
In this post, I will demonstrate how to use networks for other spatial analysis. We can think of polygons as nodes and two nodes are connected if they share some part of the boundary
Acquire data
I also want to demonstrate Kyle Walker’s excellent tigris and tidycensus packages that allow us to conveniently download Census data.
First you need to acquire an API key from census.
We will use the API key to access American Community Survey (ACS) data.The American Community Survey (ACS) is an ongoing survey by the U.S. Census Bureau. It regularly gathers information previously contained only in the long form of the decennial census, such as ancestry, educational attainment, income, language proficiency, migration, disability, employment, and housing characteristics. About 3.5 million people are surveyed every year and multi-year summaries are provided at different levels of geography (not at block level).
library(tidycensus)
library(sf)
library(tidyverse)
library(mapview)
library(RColorBrewer)
library(igraph)
library(tigris)
census_api_key(census_apikey)
varnames <- load_variables(2016, "acs5", cache = TRUE)
write_csv(varnames, path="/Users/kaza/Desktop/acs_varnames.csv")
It is useful to write the variables and their descriptions into an external file that you can use in other programs and this is usually handy for finding variables of interest. In this post, I am going to use poverty rate and unemployment rate as examples. The choice of these variables is arbitrary.
Poverty rate
For example, to find the poverty rate in Carolinas’ census tracts, you can use B17001_01 (persons for whom poverty level is tracked) and B17001_002 (persons whose income in the last 12 months was below poverty level).
NC_SC <- c('37', '45') # FIPS code for NC and SC.
# Download CBSAs and MSAs for reference.
cbsa <- core_based_statistical_areas() %>% st_as_sf() # Download Core Based Statistical Areas file
states_shps <- states() %>%
st_as_sf() %>%
filter(GEOID %in% NC_SC) # Limit to the Carolinas
MSA <- cbsa[states_shps,] %>% # Filter CBSA that is intersected/touched by the state boundary.
filter(LSAD=="M1") # Limit to Metropolitan areas
pov <- reduce(
map(NC_SC, function(x) {
get_acs(geography = "tract", variables = "B17001_002", summary_var = 'B17001_001',
state = x, geometry = TRUE, year = 2016)
}),
rbind
) ## read up on these map and reduce functions in purrr. They are key to functional programming paradigm
nrow(pov)
# [1] 3298
pov_rate <- pov %>%
rename(population = summary_est) %>%
filter(population>0)%>%
mutate(pov_rate = estimate/population) %>%
select(GEOID, NAME, population, pov_rate)
library(mapview)
mapview(pov_rate,
zcol=c('pov_rate'),
col.regions = brewer.pal(5, "Reds"),
at=quantile(pov_rate$pov_rate, seq(0,1,.2)),
lwd=0
)
Unemployment rate
Unemployment rate is trickier, because it needs to track both the labour force as well as unemployed persons. It turns out that ACS only provides this data for males and females separtely and by differnt age categories. So we need to assemble them and combine them.
# variables for labour force for age groups 16-64
lf_m <- paste("B23001_", formatC(seq(4,67,7), width=3, flag="0"), "E", sep="")
lf_f <- paste("B23001_", formatC(seq(90,153,7), width=3, flag="0"), "E", sep="")
lf <- reduce(
map(NC_SC, function(x){
get_acs(geography='tract', variables = c(lf_m, lf_f), state=x, year=2016)
}),
rbind
)
# variables for unemployed for age groups 16-64
unemp_m <- paste("B23001_", formatC(seq(8,71,7), width=3, flag="0"), "E", sep="")
unemp_f <- paste("B23001_", formatC(seq(94,157,7), width=3, flag="0"), "E", sep="")
unemp <- reduce(
map(NC_SC, function(x){
get_acs(geography='tract', variables = c(unemp_m, unemp_f), state=x, year=2016)
}),
rbind
)
lf_t <- lf %>%
group_by(GEOID) %>%
summarize(lf_est = sum(estimate, na.rm=T))
unemp_t <- unemp %>%
group_by(GEOID) %>%
summarize(unemp_est = sum(estimate, na.rm=T))
unemp_rate <- merge(lf_t, unemp_t, by='GEOID') %>%
filter(lf_est >0) %>%
mutate(unemp_rate = unemp_est/lf_est)
Exercise
- Compute unemployment rate for more meaningful classes (Women, Youth etc.) and visualise it. Are there differences in spatial distribution?
We can arbitarily define distress to be a combination of poverty and unemployment. In this instance, I am picking 20% and 10% as the threshold respectively.
tract_stats <- merge(pov_rate, unemp_rate, by='GEOID')
distressed_tracts <- tract_stats %>%
filter(unemp_rate > .10 & pov_rate > .20)
mapview(distressed_tracts, legend = FALSE)
Space as a topological network
There are reasons to presume that clusters of distress is more problematic than isolated distress. We could use the contiguity of distressed tracts to our advantage to figure where these clusters are and how they are shaped.
It turns out that the contiguity calculations in sf are still a little slower than older packages such as spdep. So I will convert the simple features into a Spatial object and use older packages.
library(rgeos)
library(spdep)
dt <- as(distressed_tracts, "Spatial") # Convert to spatial
temp1 <- gUnarySTRtreeQuery(dt) # Construct list of polygons that are likely to intersect/touch by first looking at bounding boxes.
dt_nb <- poly2nb(dt, queen=FALSE, foundInBox=temp1, row.names = dt$GEOID) # Construct a neighborhood object
plot(dt)
plot(dt_nb, coordinates(dt), col='red', points=F, add=T) # Quickly visualise the graph

distressed_tracts_graph <- dt_nb %>%
nb2mat(zero.policy=TRUE, style="B") %>% # Create a Binary Adjacency Matrix.
graph.adjacency(mode='undirected', add.rownames=NULL) # construct an undirected graph from the adjacency matrix.
cl <- clusters(distressed_tracts_graph) # This decomposes the graph into connected subgraphs.
sum(dt$GEOID!= names(cl$membership)) # this is 0. So we can safely assign the clustermembersip variable by rows rather than doing a merge.
# [1] 0
dt$clustermembership <- cl$membership
### Visualisation
# Adding the MSA boundaries for effect to see if the deprivation is primiarly a metropolitan phenomenon or outside the metropolitan areas
pal <- colorRampPalette(brewer.pal(8, "Dark2"))
numColors <- cl$membership %>% unique() %>% length()
dt %>% st_as_sf() %>%
mapview(zcol="clustermembership",
col.regions=pal(numColors),
legend = FALSE,
lwd = 0
) +
mapview(MSA, col.regions = "gray95", color='Red', lwd=2, alpha=.4, legend = FALSE)
Clusters function decomposes the graph in subgraphs, i.e. nodes belong to a subgraph, if there is a path between them. If there is not, they belong to different subgraphs.
ggplot() +
geom_histogram(aes(cl$csize))

From the above histogram it is relatively obvious that the many (64.67 %) of the distressed census tracts are are relatively isolated (cluster size is \(\le\) 2). Many of them are even in clusters of size 1. We could see if there is a geograhic pattern in these isolated clusters.
smallclusters <- (1:cl$no)[cl$csize <=2]
mapview(dt[dt$clustermembership %in% smallclusters,], legend = FALSE) + # Filter all tracts that are within 'small' clusters of deprivation.
mapview(MSA, col.regions = "gray95", color='Red', lwd=2, alpha=.4, legend = FALSE)